3 Results

3.1 Loading the data

Code
library(readxl)   # read_xlsx()
library(stringr)  # str_remove_all()

# Read the combined annual data and strip the ".x"/".y" suffixes left by earlier merges
energy_data_annual <- read_xlsx(path='./data_source/combined_annual_data.xlsx')
colnames(energy_data_annual) <- str_remove_all(colnames(energy_data_annual), "\\.[xy]")

energy_data_annual$Year <- as.integer(energy_data_annual$Year)
energy_data_annual <- energy_data_annual[, !duplicated(colnames(energy_data_annual))]
energy_data_annual
Code
# Read the combined monthly data and strip the ".x"/".y" suffixes left by earlier merges
energy_data_monthly <- read_xlsx(path='./data_source/combined_monthly_data.xlsx')
colnames(energy_data_monthly) <- str_remove_all(colnames(energy_data_monthly), "\\.[xy]")

energy_data_monthly$Month <- as.Date(energy_data_monthly$Month)
energy_data_monthly <- energy_data_monthly[, !duplicated(colnames(energy_data_monthly))]
energy_data_monthly
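
The annual and monthly loading steps repeat the same clean-up, so it could be factored into one helper. The following is a minimal sketch in base R; the function name `clean_merged_columns` and the toy data frame are illustrative assumptions and are not part of the original analysis.

```r
# Hypothetical helper: strip the ".x"/".y" suffixes left behind by a merge,
# then keep only the first occurrence of each column name.
clean_merged_columns <- function(df) {
  names(df) <- gsub("\\.[xy]", "", names(df))
  df[!duplicated(names(df))]
}

# Toy data standing in for the spreadsheet contents (made-up values).
toy <- data.frame(Year.x = c(1950, 1951),
                  Production = c(1.0, 2.0),
                  Year.y = c(1950, 1951))

cleaned <- clean_merged_columns(toy)
names(cleaned)
```

Applied to the real data, the same helper would be called once on each data frame right after `read_xlsx()`.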

3.2 Energy Production and Consumption Overview

3.2.1 Primary Energy Production

Code
ggplot(energy_data_annual, aes(x=Year)) + 
  geom_line(aes(y=`Total Fossil Fuels Production (Quadrillion Btu)`, color='Fossil Fuels Production'), size=1) +
  geom_line(aes(y=`Nuclear Electric Power Production (Quadrillion Btu)`, color='Nuclear Power Production'), size=1) +
  geom_line(aes(y=`Total Renewable Energy Production (Quadrillion Btu)`, color='Renewable Energy Production'), size=1) +
  labs(
    title='Primary Energy Production',
    x = 'Year',
    y = 'Production (Quadrillion Btu)',
    caption = 'Data Source: U.S. Energy Information Administration'
  ) + 
  theme(
    plot.title = element_text(hjust=0.5, face='bold', color='darkblue'),
    legend.position = 'bottom',
    legend.box = 'horizontal',
    legend.title = element_blank()
  ) +
  scale_color_manual(values = c('Fossil Fuels Production' = 'red', 
                                'Nuclear Power Production' = 'blue', 
                                'Renewable Energy Production' = 'green'))

The graph illustrates the trend in primary energy production from 1950 to 2020. Production rises steadily from approximately 28 quadrillion Btu in 1950 to 59 quadrillion Btu by 1970, then plateaus from 1970 to 2010. After 2010 there is a noticeable spike in production, which may be partly attributed to growing demand from energy-intensive high-performance computing in large data centers.

Fossil Fuels Production: This line rises significantly over the years, indicating a substantial increase in energy production. It suggests that fossil fuels have been the dominant contributor to primary energy production.

Nuclear and Renewable Energy Production: These lines remain relatively flat compared to fossil fuels, indicating that these sources have contributed less to overall primary energy production. They show slight increases over time, but none as pronounced as the rise in fossil fuels.

3.2.2 Primary Energy Consumption

Code
ggplot(energy_data_annual, aes(x=Year)) +
  geom_line(aes(y=`Total Fossil Fuels Consumption (Quadrillion Btu)`, color='Fossil Fuels Consumption'), size=1) +
  geom_line(aes(y=`Nuclear Electric Power Consumption (Quadrillion Btu)`, color='Nuclear Power Consumption'), size=1) +
  geom_line(aes(y=`Total Renewable Energy Consumption (Quadrillion Btu)`, color='Renewable Energy Consumption'), size=1) +
  labs(
    title='Primary Energy Consumption',
    x='Year',
    y='Consumption (Quadrillion Btu)',
    caption = 'Data Source: U.S. Energy Information Administration'
  ) + 
  scale_color_manual(
    values = c(
      'Fossil Fuels Consumption' = 'red',
      'Nuclear Power Consumption' = 'blue',
      'Renewable Energy Consumption' = 'green'
    )
  ) +
  theme(
    plot.title = element_text(hjust=0.5, face='bold', color='darkblue'),
    legend.position = 'bottom',
    legend.box = 'horizontal',
    legend.title = element_blank()
  )

Fossil Fuels Consumption: This is the dominant source of energy consumption throughout the period. There is a steady increase from 1950 to around 2005, with some fluctuations. After 2005, consumption plateaus with minor ups and downs.

Nuclear Power Consumption: Nuclear consumption starts to become significant around the late 1960s and early 1970s. It shows gradual growth until about 2000, after which it stabilizes.

Renewable Energy Consumption: Renewable consumption begins to rise noticeably in the late 1990s. It shows a steady increase, especially post-2000, and appears to be catching up with nuclear power by the end of the period.

Fossil fuels make up a large share of energy consumption throughout the period, while the other sources, nuclear and renewable, contribute comparatively little. Substantial investment in these sources is needed if they are to catch up and reduce the dependence on fossil fuels.

3.2.3 Primary Energy Imports and Exports

Code
ggplot(energy_data_annual, aes(x=Year, y=`Primary Energy Net Imports (Quadrillion Btu)`)) + 
  geom_bar(stat = 'identity', fill='orange', color='black') + 
  labs(
    title='Primary Energy Net Imports',
    x = 'Year',
    y = 'Energy (Quadrillion Btu)',
    caption = 'Data Source: U.S. Energy Information Administration'
  ) + 
  theme(
    plot.title = element_text(hjust=0.5, face='bold', color='darkblue'),
  )

The chart shows the net imports of primary energy into the United States over time.

Observations:

  • 1950s to Early 1970s: The net energy imports were relatively low and stable. This period shows minimal dependency on energy imports.

  • Mid-1970s to Early 1980s: There was a noticeable increase in net energy imports, likely due to rising energy demands and geopolitical events affecting oil supply. Since the 1970s, the global oil trade has been predominantly conducted in U.S. dollars (USD), creating a symbiosis between America’s currency and the world’s most traded commodity. The petrodollar emerged as an economic concept in the 1970s as growing U.S. imports of increasingly costly crude oil increased the dollar holdings of foreign producers.

  • 1980s to Early 2000s: Net imports rose significantly, peaking around the mid-2000s. Economic expansion and technological advancements drove growing energy demand and higher consumption of oil and natural gas, and the U.S. became increasingly reliant on foreign oil.

  • Mid-2000s to Present: There is a sharp decline in net imports, eventually turning negative. This indicates that the U.S. became a net exporter of primary energy. Factors contributing to this include increased domestic energy production (especially from shale gas and oil), improved energy efficiency, and shifts towards renewable energy sources. (Source: U.S. Energy Independence)

Overall, the chart illustrates a transition from high dependency on imported energy to a position where the U.S. exports more energy than it imports.

3.2.4 Energy Imports vs Energy Consumption

Code
ggplot(energy_data_monthly, aes(x=Month)) + 
  geom_line(aes(y=`Primary Energy Imports (Quadrillion Btu)`, color='Primary Energy Imports'), size=0.5) +
  geom_line(aes(y=`Total Primary Energy Consumption (Quadrillion Btu)`, color='Total Primary Energy Consumption'), size=0.5) +
  labs(
    title = 'Energy Consumption and Energy Imports',
    x='Timeline',
    y='Energy (Quadrillion Btu)',
    caption = 'Data Source: U.S. Energy Information Administration'
  ) +
  theme(
    plot.title = element_text(hjust=0.5, face='bold', color='darkblue'),
    legend.position = 'bottom',
    legend.box = 'horizontal',
    legend.title = element_blank()
  )

Code
ggplot(energy_data_annual, aes(x = `Primary Energy Imports (Quadrillion Btu)`, y = `Total Primary Energy Consumption (Quadrillion Btu)`)) +
  geom_point(color = "blue", size = 1) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Energy Dependency Analysis",
    x = "Primary Energy Imports (Quadrillion Btu)",
    y = "Total Primary Energy Consumption (Quadrillion Btu)",
    subtitle = paste("Pearson Correlation Coefficient:", round(cor(energy_data_annual$`Primary Energy Imports (Quadrillion Btu)`, energy_data_annual$`Total Primary Energy Consumption (Quadrillion Btu)`, method = "pearson"), 2)),
    caption = 'Data Source: U.S. Energy Information Administration'
  ) +
  theme(
    plot.title = element_text(hjust=0.5, face='bold', color='darkblue'),
    plot.subtitle =  element_text(hjust=0.5, color='purple')
  )

Together, the two graphs show how energy consumption and imports have evolved over time and how closely the two are related.

Consumption Trends
Total Primary Energy Consumption shows a steady upward trajectory, from around 5 Quadrillion Btu per month in January 1973 to approximately 7.5 Quadrillion Btu per month in August 2024. Notable seasonal fluctuations appear throughout the timeline, with regular peaks and troughs. The overall consumption pattern demonstrates consistent growth despite short-term variations.

Import Patterns
Primary Energy Imports started at roughly 1.5 Quadrillion Btu each month in the 1970s. Imports peaked around 2005-2010 at approximately 3 Quadrillion Btu each month. A notable decline in imports occurred after 2010, stabilizing at about 2 Quadrillion Btu each month by 2020.

Statistical Relationship
The scatter plot reveals a strong positive correlation between imports and consumption. The Pearson correlation coefficient of 0.95 indicates a very strong linear relationship between the two. The narrow confidence band (gray shading) suggests that the fitted line is estimated precisely. The regression line has a clear positive slope, indicating that higher imports generally correspond to higher consumption. Data points cluster tightly around the regression line, particularly in the middle range, and the relationship remains consistent across different levels of imports and consumption.
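
The quantities behind the scatter plot can be reproduced outside of ggplot2. The sketch below uses made-up numbers rather than the EIA data, and the variable names are illustrative assumptions. It computes the Pearson coefficient with `cor()` and fits the same kind of linear model that `geom_smooth(method = "lm")` draws.

```r
# Made-up annual values (Quadrillion Btu), for illustration only.
imports     <- c(10, 14, 18, 22, 26, 30)
consumption <- c(70, 76, 83, 88, 95, 99)

# Pearson correlation, the statistic reported in the plot subtitle.
r <- cor(imports, consumption, method = "pearson")

# Ordinary least-squares line of consumption on imports.
fit <- lm(consumption ~ imports)
slope <- unname(coef(fit)["imports"])

# For a regression with a single predictor, R-squared equals r squared.
r_squared <- summary(fit)$r.squared

round(c(r = r, slope = slope, r_squared = r_squared), 3)
```

Because R-squared is the square of the coefficient in this setting, a coefficient of 0.95 corresponds to about 90% of the variance in consumption being captured by the linear fit.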

Key Insights
- Despite growing total energy consumption, there’s a decreasing reliance on imports in recent years.
- The gap between consumption and imports has widened over time, suggesting increased domestic energy production or diversification of energy sources.
- The seasonal variations in consumption are more pronounced than fluctuations in imports, indicating stable import patterns despite varying demand.
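
The first two insights can be checked with simple arithmetic. The sketch below uses only the approximate monthly levels quoted in the text above, so the results are rough. Pairing the start and end consumption levels with the 1970s and 2020 import levels is an assumption made for illustration.

```r
# Approximate monthly levels quoted in the text (Quadrillion Btu).
periods     <- c("1970s", "2020s")
imports     <- c(1.5, 2.0)
consumption <- c(5.0, 7.5)

# Imports as a percentage of total consumption.
import_share <- round(100 * imports / consumption, 1)

# Absolute gap between consumption and imports.
gap <- consumption - imports

data.frame(period = periods, import_share_pct = import_share, gap = gap)
```

On these rough figures the import share falls while the gap grows, which is the pattern the insights describe.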

3.3 Sectorwise Energy Consumption Analysis

3.3.1 Energy Overview by Residential Sector

Code
rs_energy_consumed <- xts(x = energy_data_monthly$`Total Energy Consumed by the Residential Sector (Trillion Btu)`, order.by = energy_data_monthly$Month)
rs_energy_loss <- xts(x = energy_data_monthly$`Residential Sector Electrical System Energy Losses (Trillion Btu)`, order.by = energy_data_monthly$Month)

dygraph(cbind(rs_energy_consumed, rs_energy_loss), main='Energy Consumed vs Energy Loss (Residential Sector)') |>
  dySeries('rs_energy_consumed', label = 'Energy Consumed') |>
  dySeries('rs_energy_loss', label = 'Energy Loss') |>
  dyRangeSelector() |>
  dyOptions(stackedGraph = TRUE, drawPoints = TRUE, pointSize = 2) |>
  dyAxis("x", label = "Timeline") |>
  dyAxis("y", label = "Energy (Trillion Btu)")

The green line represents energy consumption, which follows a distinct seasonal pattern. Energy use regularly peaks at around 3,500 trillion Btu and drops to about 2,000 trillion Btu within each year. These peaks likely correspond to winter months when heating demands are highest, while valleys represent periods of lower energy usage, typically during milder seasons.

The blue line shows energy loss, which mirrors the consumption pattern at a significantly lower level. Energy losses typically fluctuate between 500 and 1,000 trillion Btu. This parallel pattern suggests that energy losses are roughly proportional to consumption: when more energy is consumed, more is lost through various inefficiencies.
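
One way to check which months drive the peaks is to average the series by calendar month. The sketch below is a base R illustration on a synthetic series, not the residential data. The cosine shape and all variable names are assumptions chosen so that the peak and trough match the levels quoted above.

```r
# Synthetic monthly series (made-up values, Trillion Btu): a winter-peaking
# cosine around a constant level, standing in for the residential data.
months <- seq(as.Date("2000-01-01"), by = "month", length.out = 48)
month_index <- as.integer(format(months, "%m"))
consumed <- 2750 + 750 * cos(2 * pi * (month_index - 1) / 12)

# Average consumption for each calendar month across all years.
monthly_profile <- tapply(consumed, month_index, mean)

# Calendar months with the highest and lowest average consumption.
peak_month   <- as.integer(names(which.max(monthly_profile)))
trough_month <- as.integer(names(which.min(monthly_profile)))
c(peak = peak_month, trough = trough_month)
```

With the real data, the same `tapply()` call on the consumption column and the month extracted from `Month` would show whether the peaks fall in winter as suggested.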

3.3.2 Energy Overview by Commercial Sector

Code
rs_energy_consumed <- xts(x = energy_data_monthly$`Total Energy Consumed by the Commercial Sector (Trillion Btu)`, order.by = energy_data_monthly$Month)
rs_energy_loss <- xts(x = energy_data_monthly$`Commercial Sector Electrical System Energy Losses (Trillion Btu)`, order.by = energy_data_monthly$Month)

dygraph(cbind(rs_energy_consumed, rs_energy_loss), main='Energy Consumed vs Energy Loss (Commercial Sector)') |>
  dySeries('rs_energy_consumed', label = 'Energy Consumed') |>
  dySeries('rs_energy_loss', label = 'Energy Loss') |>
  dyRangeSelector() |>
  dyOptions(stackedGraph = TRUE, drawPoints = TRUE, pointSize = 2) |>
  dyAxis("x", label = "Timeline") |>
  dyAxis("y", label = "Energy (Trillion Btu)")

The green line represents the total energy consumed by the commercial sector, measured in trillion Btu (British thermal units). Starting from around 1,000 trillion Btu in the 1970s, consumption steadily increased to peak at approximately 2,500 trillion Btu in the mid-2000s. After 2010, the consumption pattern shows more fluctuation but generally maintains a high level around 2,000 trillion Btu.

The blue line shows energy losses, which are significantly lower than consumption but follow a similar upward trend. Losses started at about 300 trillion Btu in the 1970s and gradually increased to around 700-800 trillion Btu by the 2010s. These losses likely represent energy wasted through inefficient systems, heat escape, and conversion processes.

Both lines show regular up-and-down patterns throughout each year, indicating seasonal variations in energy use. These zigzag patterns are more pronounced in the consumption (green) line, suggesting that commercial energy use is heavily influenced by seasonal factors like heating in winter and cooling in summer.

Toward the end of the graph (2015-2020s), both consumption and losses show a slight downward trend, possibly reflecting improved energy efficiency in the commercial sector or changes in business operations. The gap between consumption and loss has remained relatively consistent in recent years, suggesting that efficiency ratios have stabilized.

3.3.3 Energy Overview by Industrial Sector

Code
rs_energy_consumed <- xts(x = energy_data_monthly$`Total Energy Consumed by the Industrial Sector (Trillion Btu)`, order.by = energy_data_monthly$Month)
rs_energy_loss <- xts(x = energy_data_monthly$`Industrial Sector Electrical System Energy Losses (Trillion Btu)`, order.by = energy_data_monthly$Month)

dygraph(cbind(rs_energy_consumed, rs_energy_loss), main='Energy Consumed vs Energy Loss (Industrial Sector)') |>
  dySeries('rs_energy_consumed', label = 'Energy Consumed') |>
  dySeries('rs_energy_loss', label = 'Energy Loss') |>
  dyRangeSelector() |>
  dyOptions(stackedGraph = TRUE, drawPoints = TRUE, pointSize = 2) |>
  dyAxis("x", label = "Timeline") |>
  dyAxis("y", label = "Energy (Trillion Btu)")

The green line at the top shows how much energy industries actually used. The amount typically stays between 2,500 and 3,500 trillion Btu (British thermal units). There are some interesting patterns:
- A peak around the year 2000, reaching about 3,500 trillion Btu.
- Some noticeable dips, particularly around 1985 and 2008, due to strong global competition and the global recession, respectively.
- Regular up-and-down patterns that might represent seasonal changes.

The blue line at the bottom shows energy that was wasted or lost during use. This line is much lower, staying around 500 trillion Btu. The loss is fairly consistent over time, though it shows:
- A slight increase from 1980 to 2000.
- A gradual decrease after 2000.
- Much smaller variations compared to the consumption line.

The relationship between energy consumed and lost has remained relatively stable over these 40 years. The gap between the green and blue lines represents the energy that was successfully used for industrial purposes. This shows that industries typically lose about 15-20% of their energy during use, while successfully using about 80-85% of it.
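
The loss percentage follows directly from the levels read off the chart. The sketch below redoes that arithmetic in R using the approximate figures quoted above (losses near 500 trillion Btu, consumption between 2,500 and 3,500 trillion Btu); the variable names are illustrative.

```r
# Approximate levels quoted in the text (Trillion Btu).
loss          <- 500
consumed_low  <- 2500
consumed_high <- 3500

# Loss as a percentage of energy consumed at each end of the range.
loss_share <- round(100 * loss / c(consumed_low, consumed_high), 1)
loss_share
```

The two values bracket a loss share of roughly 14% to 20%, close to the range stated above.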

The graph reveals that while industrial energy use has fluctuated over time, the efficiency of energy use (shown by the relatively stable loss rate) has not changed dramatically over this period. The graph also shows much less seasonal variation than the graphs for the residential and commercial sectors.

3.3.4 Energy Overview by Transportation Sector

Code
rs_energy_consumed <- xts(x = energy_data_monthly$`Total Energy Consumed by the Transportation Sector (Trillion Btu)`, order.by = energy_data_monthly$Month)
rs_energy_loss <- xts(x = energy_data_monthly$`Electrical System Energy Losses Proportioned to the Transportation Sector (Trillion Btu)`, order.by = energy_data_monthly$Month)

dygraph(cbind(rs_energy_consumed, rs_energy_loss), main='Energy Consumed vs Energy Loss (Transportation Sector)') |>
  dySeries('rs_energy_consumed', label = 'Energy Consumed') |>
  dySeries('rs_energy_loss', label = 'Energy Loss') |>
  dyRangeSelector() |>
  dyOptions(stackedGraph = TRUE, drawPoints = TRUE, pointSize = 2) |>
  dyAxis("x", label = "Timeline") |>
  dyAxis("y", label = "Energy (Trillion Btu)")

The green line, representing energy consumed, shows a general upward trend from about 1,500 trillion Btu in 1980 to approximately 2,500 trillion Btu by 2020. This indicates a significant increase in energy consumption in the transportation sector over this period.

  • The graph displays regular seasonal fluctuations throughout the years.
  • A notable peak occurs around 2007-2008, reaching about 2,500 trillion Btu.
  • A sharp drop is visible around 2020, likely corresponding to the global pandemic.
  • Recovery appears to occur after the 2020 drop, returning to previous levels.

The blue line at the bottom of the graph, representing energy loss, remains nearly constant and close to zero throughout the entire period. Because this series covers only the electrical system energy losses apportioned to the sector, its low level most likely reflects how little electricity transportation uses rather than the overall efficiency of the sector.

The seasonal fluctuations noted above appear fairly consistent in amplitude throughout the measured period.

This visualization shows how transportation energy demand has grown substantially over the past 40 years, while the electrical system losses attributed to the sector have remained minimal.

3.4 Statewise Energy Production and Cost Analysis

Code
statewise <- read.csv('./data_source/Statewise_Energy_Production_Consumption_Cost.csv')
statewise <- statewise[, -c(3,5,7)]
statewise <- statewise |> rename(
  `Production_pecent` = `Production..U.S..Share`,
  `Consumption_per_capita` = `Consumption.per.Capita..Million.Btu`,
  `Expenditutre_per_capita` = `Expenditures.per.Capita..Dollars`
)

us_states <- ne_states(country = "United States of America", returnclass = "sf")
us_states <- us_states %>% select(name, iso_3166_2)
# Derive the two-letter state code from the ISO 3166-2 code and drop U.S. territories
us_states <- us_states %>% mutate(State = substring(iso_3166_2, first=4, last=5)) %>%
  filter(!State %in% c("GU", "VI", "MP", "AS", "PR"))

statewise <- us_states %>% inner_join(statewise, by = "State")
statewise
Code
ggplot(data = (statewise)) +
  geom_sf(aes(fill = Production_pecent), color = "black") +
  scale_fill_distiller(palette = "Reds", direction=1, name = "Energy Production %", 
                       na.value = "grey50") +
  geom_sf_text(aes(label = State), size = 3, color = "black") +
  coord_sf(xlim = c(-180, -50), ylim = c(15, 75)) +
  theme_void() + 
  labs(
    title = "Energy Production % by US Mainland, Alaska, and Hawaii") +
  theme(
    plot.title = element_text(face='bold', hjust=0.5, color='darkblue'),
  )

This choropleth map displays the percentage distribution of energy production across the United States, including Alaska and Hawaii. The color gradient ranges from light pink (lowest percentage) to dark red (highest percentage), with values ranging from 0% to 25%.

Texas stands out prominently in dark red, contributing approximately 25% of the nation’s total energy production. The map reveals a concentration of energy production in the central United States, particularly in the South-Central region. Most coastal states and the Northeast show relatively lower energy production percentages, as indicated by their lighter shading.

  • The Western states generally show minimal energy production.
  • Alaska and Hawaii, shown separately from the mainland, display low production levels.
  • The Midwest and Eastern seaboard states contribute relatively small percentages to the national total.

The visualization effectively demonstrates the dominance of Texas in U.S. energy production while highlighting the significant regional disparities in energy production across the country.

Code
ggplot(data = (statewise)) +
  geom_sf(aes(fill = Expenditutre_per_capita), color = "black") +
  scale_fill_distiller(palette = "Blues", direction=1, name = "Expenditure per capita", 
                       na.value = "grey50") +
  geom_sf_text(aes(label = State), size = 3, color = "black") +
  coord_sf(xlim = c(-180, -50), ylim = c(15, 75)) +
  theme_void() + 
  labs(
    title = "Energy Expenditure per Capita in US Mainland, Alaska, and Hawaii") +
  theme(
    plot.title = element_text(face='bold', hjust=0.5, color='darkblue'),
  )

This map shows energy expenditure per capita across the United States, including Alaska and Hawaii. The data is represented through the color gradient of the Blues palette used in the code, where darker shades indicate higher expenditure and lighter shades indicate lower expenditure.

Regional Patterns
Alaska stands out with the highest energy expenditure per capita, shown in the darkest shade, suggesting spending of more than $12,000 per person.

Mainland US shows varying levels of expenditure:
- The Western states generally show higher per capita spending.
- The Midwest displays moderate expenditure levels.
- Several Northern states, particularly in the Mountain region, differ visibly from their neighbors, indicating different spending patterns.
- The Southeast region demonstrates moderate to high expenditure levels.

The expenditure scale ranges from approximately:
- Highest: > $12,000.
- Mid-range: $6,000-8,000.
- Lower range: about $4,000.

This visualization highlights the geographic disparities in energy spending across the United States, with notable variations between regions and individual states.